
Figure 1. Spectrograms of the synthesis results for 5 tokens, with the model trained on an internal dataset. From top to bottom: tokens 1 to 5.
Figure 2. Spectrograms of the synthesis results for 5 tokens, with the model trained on the VCTK dataset. From top to bottom: tokens 1 to 5.

Figure 3. Spectrograms of the synthesis results for 5 tokens, with the model trained on the Blizzard2013 dataset. From top to bottom: tokens 1 to 5.
The following shows three examples of prosody transfer synthesis.
In each example, the text of the synthesized utterance is the same as that of the reference. The first utterance in each example is the reference, the second is the synthesis result with neutral prosody, and the third is the prosody transfer result.
The following shows three examples of non-parallel prosody transfer synthesis.
In each example, the text of the synthesized utterance differs from that of the reference. The first utterance in each example is the reference. The second and third are prosody transfer results synthesized from two different texts.